{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# set tf 1.x for colab\n",
    "%tensorflow_version 1.x"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "# MNIST digits classification with TensorFlow"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"images/mnist_sample.png\" style=\"width:30%\">"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from sklearn.metrics import accuracy_score\n",
    "from matplotlib import pyplot as plt\n",
    "%matplotlib inline\n",
    "import tensorflow as tf\n",
    "print(\"We're using TF\", tf.__version__)\n",
    "\n",
    "import sys\n",
    "sys.path.append(\"../..\")\n",
    "import grading\n",
    "\n",
    "import matplotlib_utils\n",
    "from importlib import reload\n",
    "reload(matplotlib_utils)\n",
    "\n",
    "import grading_utils\n",
    "reload(grading_utils)\n",
    "\n",
    "import keras_utils\n",
    "from keras_utils import reset_tf_session"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fill in your Coursera token and email\n",
    "To successfully submit your answers to our grader, please fill in your Coursera submission token and email"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "grader = grading.Grader(assignment_key=\"XtD7ho3TEeiHQBLWejjYAA\", \n",
    "                        all_parts=[\"9XaAS\", \"vmogZ\", \"RMv95\", \"i8bgs\", \"rE763\"])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# token expires every 30 min\n",
    "COURSERA_TOKEN = \"### YOUR TOKEN HERE ###\"\n",
    "COURSERA_EMAIL = \"### YOUR EMAIL HERE ###\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Look at the data\n",
    "\n",
    "In this task we have 50000 28x28 images of digits from 0 to 9.\n",
    "We will train a classifier on this data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import preprocessed_mnist\n",
    "X_train, y_train, X_val, y_val, X_test, y_test = preprocessed_mnist.load_dataset()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# X contains rgb values divided by 255\n",
    "print(\"X_train [shape %s] sample patch:\\n\" % (str(X_train.shape)), X_train[1, 15:20, 5:10])\n",
    "print(\"A closeup of a sample patch:\")\n",
    "plt.imshow(X_train[1, 15:20, 5:10], cmap=\"Greys\")\n",
    "plt.show()\n",
    "print(\"And the whole sample:\")\n",
    "plt.imshow(X_train[1], cmap=\"Greys\")\n",
    "plt.show()\n",
    "print(\"y_train [shape %s] 10 samples:\\n\" % (str(y_train.shape)), y_train[:10])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Linear model\n",
    "\n",
    "Your task is to train a linear classifier $\\vec{x} \\rightarrow y$ with SGD using TensorFlow.\n",
    "\n",
    "You will need to calculate a logit (a linear transformation) $z_k$ for each class: \n",
    "$$z_k = \\vec{x} \\cdot \\vec{w_k} + b_k \\quad k = 0..9$$\n",
    "\n",
    "And transform logits $z_k$ to valid probabilities $p_k$ with softmax: \n",
    "$$p_k = \\frac{e^{z_k}}{\\sum_{i=0}^{9}{e^{z_i}}} \\quad k = 0..9$$\n",
    "\n",
    "We will use a cross-entropy loss to train our multi-class classifier:\n",
    "$$\\text{cross-entropy}(y, p) = -\\sum_{k=0}^{9}{\\log(p_k)[y = k]}$$ \n",
    "\n",
    "where \n",
    "$$\n",
    "[x]=\\begin{cases}\n",
    "       1, \\quad \\text{if $x$ is true} \\\\\n",
    "       0, \\quad \\text{otherwise}\n",
    "    \\end{cases}\n",
    "$$\n",
    "\n",
    "Cross-entropy minimization pushes $p_k$ close to 1 when $y = k$, which is what we want.\n",
    "\n",
    "Here's the plan:\n",
    "* Flatten the images (28x28 -> 784) with `X_train.reshape((X_train.shape[0], -1))` to simplify our linear model implementation\n",
    "* Use a matrix placeholder for flattened `X_train`\n",
    "* Convert `y_train` to one-hot encoded vectors that are needed for cross-entropy\n",
    "* Use a shared variable `W` for all weights (a column $\\vec{w_k}$ per class) and `b` for all biases.\n",
    "* Aim for ~0.93 validation accuracy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_train_flat = X_train.reshape((X_train.shape[0], -1))\n",
    "print(X_train_flat.shape)\n",
    "\n",
    "X_val_flat = X_val.reshape((X_val.shape[0], -1))\n",
    "print(X_val_flat.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import keras\n",
    "\n",
    "y_train_oh = keras.utils.to_categorical(y_train, 10)\n",
    "y_val_oh = keras.utils.to_categorical(y_val, 10)\n",
    "\n",
    "print(y_train_oh.shape)\n",
    "print(y_train_oh[:3], y_train[:3])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# run this again if you remake your graph\n",
    "s = reset_tf_session()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Model parameters: W and b\n",
    "W = ### YOUR CODE HERE ### tf.get_variable(...) with shape[0] = 784\n",
    "b = ### YOUR CODE HERE ### tf.get_variable(...)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Placeholders for the input data\n",
    "input_X = ### YOUR CODE HERE ### tf.placeholder(...) for flat X with shape[0] = None for any batch size\n",
    "input_y = ### YOUR CODE HERE ### tf.placeholder(...) for one-hot encoded true labels"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute predictions\n",
    "logits = ### YOUR CODE HERE ### logits for input_X, resulting shape should be [input_X.shape[0], 10]\n",
    "probas = ### YOUR CODE HERE ### apply tf.nn.softmax to logits\n",
    "classes = ### YOUR CODE HERE ### apply tf.argmax to find a class index with highest probability\n",
    "\n",
    "# Loss should be a scalar number: average loss over all the objects with tf.reduce_mean().\n",
    "# Use tf.nn.softmax_cross_entropy_with_logits on top of one-hot encoded input_y and logits.\n",
    "# It is identical to calculating cross-entropy on top of probas, but is more numerically friendly (read the docs).\n",
    "loss = ### YOUR CODE HERE ### cross-entropy loss\n",
    "\n",
    "# Use a default tf.train.AdamOptimizer to get an SGD step\n",
    "step = ### YOUR CODE HERE ### optimizer step that minimizes the loss"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "s.run(tf.global_variables_initializer())\n",
    "\n",
    "BATCH_SIZE = 512\n",
    "EPOCHS = 40\n",
    "\n",
    "# for logging the progress right here in Jupyter (for those who don't have TensorBoard)\n",
    "simpleTrainingCurves = matplotlib_utils.SimpleTrainingCurves(\"cross-entropy\", \"accuracy\")\n",
    "\n",
    "for epoch in range(EPOCHS):  # we finish an epoch when we've looked at all training samples\n",
    "    \n",
    "    batch_losses = []\n",
    "    for batch_start in range(0, X_train_flat.shape[0], BATCH_SIZE):  # data is already shuffled\n",
    "        _, batch_loss = s.run([step, loss], {input_X: X_train_flat[batch_start:batch_start+BATCH_SIZE], \n",
    "                                             input_y: y_train_oh[batch_start:batch_start+BATCH_SIZE]})\n",
    "        # collect batch losses, this is almost free as we need a forward pass for backprop anyway\n",
    "        batch_losses.append(batch_loss)\n",
    "\n",
    "    train_loss = np.mean(batch_losses)\n",
    "    val_loss = s.run(loss, {input_X: X_val_flat, input_y: y_val_oh})  # this part is usually small\n",
    "    train_accuracy = accuracy_score(y_train, s.run(classes, {input_X: X_train_flat}))  # this is slow and usually skipped\n",
    "    valid_accuracy = accuracy_score(y_val, s.run(classes, {input_X: X_val_flat}))  \n",
    "    simpleTrainingCurves.add(train_loss, val_loss, train_accuracy, valid_accuracy)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Submit a linear model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## GRADED PART, DO NOT CHANGE!\n",
    "# Testing shapes \n",
    "grader.set_answer(\"9XaAS\", grading_utils.get_tensors_shapes_string([W, b, input_X, input_y, logits, probas, classes]))\n",
    "# Validation loss\n",
    "grader.set_answer(\"vmogZ\", s.run(loss, {input_X: X_val_flat, input_y: y_val_oh}))\n",
    "# Validation accuracy\n",
    "grader.set_answer(\"RMv95\", accuracy_score(y_val, s.run(classes, {input_X: X_val_flat})))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# you can make submission with answers so far to check yourself at this stage\n",
    "grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MLP with hidden layers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Previously we've coded a dense layer with matrix multiplication by hand. \n",
    "But this is not convenient, you have to create a lot of variables and your code becomes a mess. \n",
    "In TensorFlow there's an easier way to make a dense layer:\n",
    "```python\n",
    "hidden1 = tf.layers.dense(inputs, 256, activation=tf.nn.sigmoid)\n",
    "```\n",
    "\n",
    "That will create all the necessary variables automatically.\n",
    "Here you can also choose an activation function (remember that we need it for a hidden layer!).\n",
    "\n",
    "Now define the MLP with 2 hidden layers and restart training with the cell above.\n",
    "\n",
    "You're aiming for ~0.97 validation accuracy here."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# write the code here to get a new `step` operation and then run the cell with training loop above.\n",
    "# name your variables in the same way (e.g. logits, probas, classes, etc) for safety.\n",
    "### YOUR CODE HERE ###"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Submit the MLP with 2 hidden layers\n",
    "Run these cells after training the MLP with 2 hidden layers"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "## GRADED PART, DO NOT CHANGE!\n",
    "# Validation loss for MLP\n",
    "grader.set_answer(\"i8bgs\", s.run(loss, {input_X: X_val_flat, input_y: y_val_oh}))\n",
    "# Validation accuracy for MLP\n",
    "grader.set_answer(\"rE763\", accuracy_score(y_val, s.run(classes, {input_X: X_val_flat})))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# you can make submission with answers so far to check yourself at this stage\n",
    "grader.submit(COURSERA_EMAIL, COURSERA_TOKEN)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
